Univariate whole-genome regression models for estimating breeding values, fitted via expectation-maximization (EM) and implemented in C++.
emRR(y, gen, df = 10, R2 = 0.5)
emBA(y, gen, df = 10, R2 = 0.5)
emBB(y, gen, df = 10, R2 = 0.5, Pi = 0.75)
emBC(y, gen, df = 10, R2 = 0.5, Pi = 0.75)
emBCpi(y, gen, df = 10, R2 = 0.5, Pi = 0.75)
emBL(y, gen, R2 = 0.5, alpha = 0.02)
emEN(y, gen, R2 = 0.5, alpha = 0.02)
emDE(y, gen, R2 = 0.5)
emML(y, gen, D = NULL)
lasso(y, gen)
emCV(y, gen, k = 5, n = 5, Pi = 0.75, alpha = 0.02,
     df = 10, R2 = 0.5, avg = TRUE, llo = NULL, tbv = NULL, ReturnGebv = FALSE)
The EM functions return a list with the intercept (\(mu\)), the regression coefficients (\(b\)), the fitted values (\(hat\)), and the estimated intraclass correlation (\(h2\)).
The function emCV returns the predictive ability of each model, that is, the correlation between the predicted and observed values from \(k\)-fold cross-validations repeated \(n\) times.
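To make the notion of predictive ability concrete, the following is a minimal base-R sketch of a \(k\)-fold cross-validation repeated \(n\) times. It is not the internal code of emCV: the helper names fit_ridge, predict_ridge and cv_ability are hypothetical, the data are simulated, and a closed-form ridge regression with an arbitrary shrinkage value stands in for the EM-fitted models.

```r
# Stand-in learner: closed-form ridge regression (NOT the package's EM fit).
fit_ridge <- function(y, X, lambda = 5) {
  ctr <- colMeans(X)                 # marker means from the training set
  Xc <- sweep(X, 2, ctr)             # center markers
  mu <- mean(y)                      # intercept
  b <- drop(solve(crossprod(Xc) + diag(lambda, ncol(X)),
                  crossprod(Xc, y - mu)))
  list(mu = mu, b = b, ctr = ctr)
}

# Predict new observations from a fitted stand-in model.
predict_ridge <- function(fit, X) {
  fit$mu + drop(sweep(X, 2, fit$ctr) %*% fit$b)
}

# k-fold cross-validation repeated n times; returns an n x k matrix of
# correlations between predicted and observed values in the held-out folds.
cv_ability <- function(y, X, k = 5, n = 5) {
  cors <- matrix(NA_real_, n, k)
  for (r in seq_len(n)) {
    fold <- sample(rep(seq_len(k), length.out = length(y)))  # random folds
    for (f in seq_len(k)) {
      test <- fold == f
      fit <- fit_ridge(y[!test], X[!test, , drop = FALSE])
      pred <- predict_ridge(fit, X[test, , drop = FALSE])
      cors[r, f] <- cor(pred, y[test])
    }
  }
  cors
}

# Simulated data: 120 observations, 15 markers coded 0/1/2 (hypothetical sizes).
set.seed(42)
X <- matrix(sample(0:2, 120 * 15, replace = TRUE), 120, 15)
y <- drop(X %*% rnorm(15)) + rnorm(120)

cors <- cv_ability(y, X, k = 3, n = 3)
mean(cors)   # average predictive ability across all folds and repeats
```

Averaging the matrix gives one summary figure per model, while the individual entries show the variation across folds and repeats.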
y: Numeric vector of the response variable (length \(n\)). NA is not allowed.
gen: Numeric matrix containing the genotypic data, with \(n\) rows of observations and \(m\) columns of molecular markers.
df: Hyperprior degrees of freedom of the variance components.
R2: Expected R2, used to calculate the prior shape (de los Campos et al. 2013).
Pi: Value between 0 and 1. Expected probability of a marker having a null effect (or 1-Pi if Pi > 0.5).
alpha: Value between 0 and 1. Intensity of L1 variable selection.
D: NULL or numeric vector of marker weights, one per marker (length \(m\)).
k: Integer. Number of folds in the k-fold cross-validation.
n: Integer. Number of times the cross-validation is repeated.
avg: Logical. If TRUE, return the average across cross-validations; otherwise return the correlations within each cross-validation.
llo: NULL or a vector (numeric or factor) with the same length as y. If provided, the cross-validations are performed as Leave a Level Out (LLO), which allows the user to predefine the splits. This argument overrides k and n.
tbv: NULL or numeric vector of 'true breeding values' (length \(n\)) to compare the cross-validation predictions against. If NULL, the phenotypes are used as the prediction target.
ReturnGebv: Logical. If TRUE, return a list with the average marker values and fitted values across all cross-validations, in addition to the regular output.
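As an illustration of how a Leave a Level Out split can be derived from a grouping vector such as llo, here is a minimal base-R sketch. The family labels are made up, and the code only shows the general idea of holding out one level at a time; it does not reproduce how emCV builds its splits internally.

```r
# Hypothetical grouping vector: one label per observation.
llo <- factor(c("A", "A", "B", "B", "B", "C", "C"))

# Each level defines one test set: the indices of the observations in it.
test_sets <- split(seq_along(llo), llo)

# The matching training set is every observation outside that level.
train_sets <- lapply(test_sets, function(idx) setdiff(seq_along(llo), idx))

test_sets$B    # observations held out when level "B" is left out
train_sets$B   # observations used for fitting in that split
```

With splits defined this way, the number of cross-validation rounds equals the number of levels.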
Alencar Xavier
The model for the whole-genome regression is as follows:
$$y = mu + Xb + e$$
where \(y\) is the response variable, \(mu\) is the intercept, \(X\) is the genotypic matrix, \(b\) is the effect of an allele substitution (or regression coefficient) and \(e\) is the residual term. A k-fold cross-validation for model evaluation is provided by \(emCV\).
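The pieces of this model can be sketched in base R on simulated data. The example below is only an assumption-laden illustration: the sample sizes, the true effects and the shrinkage value lambda are hypothetical, and the coefficients are obtained with a closed-form ridge solution rather than the package's EM algorithm.

```r
set.seed(1)
n <- 100   # observations (hypothetical)
m <- 20    # markers (hypothetical)

# Genotypic matrix X coded as 0/1/2 allele counts.
X <- matrix(sample(0:2, n * m, replace = TRUE), n, m)

# Simulate y = mu + Xb + e with an intercept of 10.
b_true <- rnorm(m, sd = 0.5)
y <- 10 + drop(X %*% b_true) + rnorm(n)

# Center the markers so that the intercept estimate is simply mean(y).
Xc <- scale(X, center = TRUE, scale = FALSE)
mu <- mean(y)

# Ridge estimate of the marker effects (stand-in for the EM fit).
lambda <- 5
b <- drop(solve(crossprod(Xc) + diag(lambda, m), crossprod(Xc, y - mu)))

hat <- mu + drop(Xc %*% b)   # fitted values
e <- y - hat                 # residuals

cor(b, b_true)               # agreement between estimated and true effects
```

The objects mu, b and hat mirror the intercept, regression coefficients and fitted values of the model equation above.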
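if (FALSE) {
  # Load the example data set that supplies the y and gen objects used below
  data(tpod)
  # 3-fold cross-validation repeated 3 times
  emCV(y, gen, k = 3, n = 3)
}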